92 research outputs found

    Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube

    Full text link
    Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small area. Although several studies have taken advantage of the novel architecture of HMC, its characteristics in terms of latency and bandwidth or their correlation with temperature and power consumption have not been fully explored. This paper is the first, to the best of our knowledge, to characterize the thermal behavior of HMC in a real environment using the AC-510 accelerator and to identify temperature as a new limitation for this state-of-the-art design space. Moreover, besides bandwidth studies, we deconstruct factors that contribute to latency and reveal their sources for high- and low-load accesses. The results of this paper demonstrates essential behaviors and performance bottlenecks for future explorations of packet-switched and 3D-stacked memories.Comment: EEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-

    Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Full text link
    Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits such as high bandwidth and memory-level parallelism. However, the implications of NoCs on the characteristics of 3D-stacked memories in terms of memory access latency and bandwidth have not been fully explored. This paper addresses this knowledge gap by (i) characterizing an HMC prototype on the AC-510 accelerator board and revealing its access latency behaviors, and (ii) by investigating the implications of such behaviors on system and software designs

    Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

    Full text link
    Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in performing faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization -- the classic challenge of sparse problems -- their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge not resolved, but also it becomes more serious with the advent of domain-specific architectures (DSAs), as they intend to more aggressively improve performance. The performance implications of using various formats along with DSAs, however, has not been extensively studied by prior work. To fill this gap of knowledge, we characterize the impact of using seven frequently used sparse formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.Comment: 11 pages, 14 figures, 2 table

    HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

    Get PDF
    Research areas: Computer architecture, Programming analysisHeterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications on heterogeneous computing platforms is not an easy task. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical decisions to improve performance. In this paper, we propose a profiler, HPerf, to identify an efficient task distribution on CPUs+GPUs system with low profiling overhead. HPerf is a hierarchical profiler. First it performs lightweight profiling and then if necessary, it performs detailed profiling to measure caching and data transfer cost. Compared to a brute-force approach, HPerf reduces the profiling overhead significantly and compared to a naive decision, HPerf improves the performance of OpenCL applications up to 25%

    Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial

    Get PDF
    Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has shown to reduce albuminuria in type 2 diabetes (T2D) patients with chronic kidney disease (CKD) while revealing only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials. Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared to placebo at reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate >= 25 mL/min/1.73 m(2) and albuminuria (urinary albumin-to-creatinine ratio >= 30 to <= 5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level alpha = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure. Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits with the addition of finerenone to their treatment regimen. Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
    corecore